
    Feature selection on wide multiclass problems using OVA-RFE

    Feature selection is a pre-processing technique commonly used with high-dimensional datasets. It aims at reducing the dimensionality of the input space, discarding useless or redundant variables, in order to increase the performance and interpretability of models. For multiclass classification problems, recent works suggested that decomposing the multiclass problem into a set of binary ones, and performing feature selection on those binary problems, could be a sound strategy. In this work we combined the well-known Recursive Feature Elimination (RFE) algorithm with the simple One-Vs-All (OVA) technique for multiclass problems, to produce the new OVA-RFE selection method. We evaluated OVA-RFE using wide datasets from genomic and mass-spectrometry analysis, and several classifiers. In particular, we compared the new method with traditional RFE (applied to a direct multiclass classifier) in terms of accuracy and stability. Our results show that OVA-RFE is no better than the traditional method, in opposition to previous results on similar methods. The discrepancy is related to a different interpretation of the real number of variables in use by each method.
    Affiliations: Granitto, Pablo Miguel (Erasmus Université Paul Cézanne Aix Marseille III, France; Universidad Nacional de Rosario, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina); Burgos, Andrés (Erasmus Université Paul Cézanne Aix Marseille III, France; Universidad Nacional de Rosario, Argentina).
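
    The abstract describes decomposing a multiclass task into one-vs-all binary problems and running RFE on each. Below is a minimal sketch of that general idea in Python with scikit-learn, assuming a linear SVM as the feature ranker and arbitrary sizes for the synthetic data and the per-class selection; it is not the paper's exact setup.

    # OVA-RFE-style selection sketch: run RFE independently on each one-vs-all
    # binary problem and pool the surviving features. The linear-SVM ranker and
    # the "50 features per binary problem" target are illustrative assumptions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=120, n_features=2000, n_informative=40,
                               n_classes=4, n_clusters_per_class=1, random_state=0)

    selected = set()
    for c in np.unique(y):
        y_bin = (y == c).astype(int)                   # one-vs-all relabeling
        rfe = RFE(LinearSVC(C=1.0, max_iter=5000),
                  n_features_to_select=50, step=0.1)   # drop 10% of features per round
        rfe.fit(X, y_bin)
        selected.update(np.flatnonzero(rfe.support_))  # pool features kept for this class

    print(f"{len(selected)} distinct features selected across all binary problems")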

    Clustering gene expression data with the PKNNG metric

    In this work we use the recently introduced PKNNG metric, combined with a simple Hierarchical Clustering (HC) method, to find accurate and stable solutions for the clustering of gene expression datasets. On real-world problems it is important to evaluate the quality of the clustering process. Accordingly, we use a suitable framework to analyze the stability of the clustering solutions obtained by HC + PKNNG. Using an artificial problem and two gene expression datasets, we show that the PKNNG metric gives better solutions than the Euclidean metric and that those solutions are stable. Our results show the potential of combining PKNNG-based clustering with stability analysis for the class discovery process in high-throughput data.
    Workshop de Agentes y Sistemas Inteligentes (WASI). Red de Universidades con Carreras en Informática (RedUNCI).
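
    A rough sketch of the underlying idea (graph-based distances fed into hierarchical clustering) is given below in Python. The PKNNG penalization of edges that bridge disconnected graph components is only approximated here by a large constant; the neighbourhood size, linkage, and dataset choices are illustrative assumptions, and scikit-learn >= 1.2 is assumed for the metric="precomputed" argument.

    # Geodesic distances over a k-nearest-neighbour graph, then average-linkage
    # hierarchical clustering on the precomputed distance matrix.
    import numpy as np
    from scipy.sparse.csgraph import shortest_path
    from sklearn.cluster import AgglomerativeClustering
    from sklearn.datasets import make_moons
    from sklearn.neighbors import kneighbors_graph

    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

    knn = kneighbors_graph(X, n_neighbors=5, mode="distance")
    knn = knn.maximum(knn.T)                       # make the graph symmetric
    D = shortest_path(knn, method="D", directed=False)

    finite = D[np.isfinite(D)]
    D[~np.isfinite(D)] = 10 * finite.max()         # crude stand-in for PKNNG's edge penalty

    labels = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                     linkage="average").fit_predict(D)
    print(np.bincount(labels))                     # cluster sizes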

    Feature selection with simple ANN ensembles

    Feature selection is a well-known pre-processing technique, commonly used with high-dimensional datasets. Its main goal is to discard useless or redundant variables, reducing the dimensionality of the input space, in order to increase the performance and interpretability of models. In this work we introduce ANN-RFE, a new technique for feature selection that combines the accurate and time-efficient RFE method with the strong discrimination capabilities of ANN ensembles. In particular, we discuss two feature importance metrics that can be used with ANN-RFE: the shuffling and dE metrics. We evaluate the new method using an artificial example and five real-world wide datasets, including gene-expression data. Our results suggest that both metrics have equivalent capabilities for the selection of informative variables. ANN-RFE seems to produce overall results that are equivalent to previous efficient methods, but can be more accurate on particular datasets.
    Presented at the X Workshop Agentes y Sistemas Inteligentes. Red de Universidades con Carreras en Informática (RedUNCI).
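
    Only the shuffling metric lends itself to a compact sketch here; the dE metric is not reproduced. The Python fragment below shows an RFE loop driven by permutation (shuffling) importance computed on a small ensemble of neural networks. Ensemble size, network width, and the elimination schedule are arbitrary illustrative choices, not the paper's settings.

    # RFE driven by permutation importance of an MLP voting ensemble.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=200, n_features=40, n_informative=8,
                               random_state=0)
    active = np.arange(X.shape[1])                 # features still in play

    while len(active) > 10:
        ensemble = VotingClassifier(
            [(f"net{i}", MLPClassifier(hidden_layer_sizes=(16,), max_iter=500,
                                       random_state=i)) for i in range(5)],
            voting="soft")
        ensemble.fit(X[:, active], y)
        imp = permutation_importance(ensemble, X[:, active], y,
                                     n_repeats=5, random_state=0).importances_mean
        drop = max(1, len(active) // 5)            # eliminate the worst ~20% each round
        active = active[np.argsort(imp)[drop:]]    # keep everything but the lowest scores

    print("surviving features:", np.sort(active))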

    Aggregation algorithms for regression: A comparison with boosting and SVM techniques

    Classification and regression ensembles show generalization capabilities that outperform those of single predictors. We present here a further evaluation of two algorithms for ensemble construction recently proposed by us. In particular, we compare them with Boosting and Support Vector Machine techniques, which are the newest and most sophisticated methods for treating classification and regression problems. We show that our comparatively simpler algorithms are very competitive with these techniques, even showing a sensible improvement in performance on some of the standard statistical databases used as benchmarks.
    Track: Agentes y Sistemas Inteligentes (ASI). Red de Universidades con Carreras en Informática (RedUNCI).
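
    The kind of comparison described above can be mimicked with off-the-shelf components, as in the Python sketch below. The paper's own aggregation algorithms are not reproduced; plain bagging of regression trees stands in for them, and the benchmark, hyperparameters, and scoring are illustrative assumptions.

    # Cross-validated RMSE of a bagged ensemble vs. boosting vs. an SVM regressor.
    from sklearn.datasets import load_diabetes
    from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR
    from sklearn.tree import DecisionTreeRegressor

    X, y = load_diabetes(return_X_y=True)
    models = {
        "aggregated trees": BaggingRegressor(DecisionTreeRegressor(), n_estimators=50,
                                             random_state=0),
        "boosting": GradientBoostingRegressor(random_state=0),
        "SVM (RBF)": SVR(C=10.0),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5,
                                 scoring="neg_root_mean_squared_error")
        print(f"{name:18s} RMSE = {-scores.mean():.1f} +/- {scores.std():.1f}")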

    Discussing a new Divisive Hierarchical Clustering algorithm

    We present DHClus, a new Divisive Hierarchical Clustering algorithm developed to detect clusters with arbitrary shapes. Our algorithm is able to solve clustering problems defined at different scales, i.e. clusters with arbitrarily dissimilar densities, connectivity, or between-cluster distances. The algorithm not only works under these difficult conditions but is also able to find the number of clusters automatically. This paper describes the new algorithm and then presents results on real gene expression data. We compare the results of DHClus with those of other algorithms to provide a frame of reference.
    Sociedad Argentina de Informática e Investigación Operativa (SADIO).
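
    DHClus itself (its split criterion and automatic stopping rule) is not reproduced here; the Python fragment below only illustrates the generic divisive idea: repeatedly bisect the largest remaining cluster with a spectral 2-way split, which tolerates non-convex shapes. The number of splits is fixed by hand rather than found automatically.

    # Generic divisive clustering sketch (not DHClus): bisect the largest cluster.
    import numpy as np
    from sklearn.cluster import SpectralClustering
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
    labels = np.zeros(len(X), dtype=int)

    n_splits = 1                                     # chosen by hand for this toy data
    for _ in range(n_splits):
        target = np.argmax(np.bincount(labels))      # largest cluster so far
        mask = labels == target
        sub = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                                 n_neighbors=10, random_state=0).fit_predict(X[mask])
        labels[mask] = np.where(sub == 0, target, labels.max() + 1)

    print(np.bincount(labels))                       # cluster sizes after splitting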

    Deep Architectures on Drifting Concepts: A Simple Approach

    Many real-world problems may vary over time. These non-stationary problems have been widely studied in the literature, often under the name of drifting-concepts problems. Recently, deep architectures have drawn growing attention, given that they can easily model functions that are hard to approximate with shallow ones, and an effective way of training them has been discovered. In this work we adapt a deep architecture to problems that present concept drift. To this end we show a way of combining it with a widely known concept-drift technique, the Streaming Ensemble Algorithm (SEA). We evaluate the new method using appropriate drifting problems and compare its performance with a more traditional approach. The results obtained are promising and show that the proposed variation is effective at combining the expressive power of a deep architecture with the adaptability of SEA.
    Sociedad Argentina de Informática e Investigación Operativa (SADIO).
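
    A hedged sketch of the SEA idea follows: train one model per data chunk, keep a fixed-size committee, and discard the weakest member when the committee overflows. A small scikit-learn MLP stands in for the paper's deep architecture, the drifting stream is synthetic, and the member-quality score is simply accuracy on the newest chunk.

    # Streaming-ensemble (SEA-style) sketch on a slowly rotating decision boundary.
    import numpy as np
    from sklearn.base import clone
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    ensemble, max_members = [], 5
    template = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300)

    for t in range(20):                                  # stream of 20 chunks
        drift = t / 20.0                                 # the concept slowly rotates
        X = rng.normal(size=(200, 2))
        y = (X[:, 0] * np.cos(drift) + X[:, 1] * np.sin(drift) > 0).astype(int)

        ensemble.append(clone(template).fit(X, y))       # one new member per chunk
        if len(ensemble) > max_members:                   # drop the member that does
            scores = [m.score(X, y) for m in ensemble]    # worst on the current chunk
            ensemble.pop(int(np.argmin(scores)))

        votes = np.mean([m.predict(X) for m in ensemble], axis=0)
        print(f"chunk {t:2d}: ensemble accuracy {np.mean((votes > 0.5) == y):.2f}")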

    Boosting classifiers for weed seeds identification

    The identification and classification of seeds are of major technical and economic importance in the agricultural industry. To automate these activities, as in ocular inspection, one should consider seed size, shape, color, and texture, which can be obtained from seed images. In this work we complement and expand a previous study on the discriminating power of these characteristics for the unique identification of seeds of 57 weed species. In particular, we establish statistical bounds and confidence levels on the results reported in our preliminary study. Furthermore, we discuss the possibility of improving the naïve Bayes and artificial neural network classifiers previously developed in order to avoid the use of color features as classification parameters. Morphological and textural seed characteristics can be obtained from black-and-white images, which are easier to process and require cheaper hardware than color ones. To this end we boost the classification methods by means of the AdaBoost.M1 technique, and compare the results with the performance achieved when using color images. We conclude that the improvement in classification accuracy after boosting the naïve Bayes and neural classifiers does not fully compensate for the discriminating power of color characteristics. However, it might be enough to keep the classifier acceptable in practical applications.
    Track: Visión. Red de Universidades con Carreras en Informática (RedUNCI).
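
    The boosting step can be illustrated with a small Python sketch: wrap a naive Bayes base classifier in AdaBoost and compare it against the unboosted version. Synthetic features stand in for the morphological and textural seed descriptors, and scikit-learn's SAMME option is used as the closest available analogue of AdaBoost.M1; none of this reproduces the paper's actual data or classifiers.

    # Boosted vs. plain naive Bayes on synthetic multi-class feature vectors.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=600, n_features=12, n_informative=8,
                               n_classes=6, n_clusters_per_class=1, random_state=0)

    plain = GaussianNB()
    boosted = AdaBoostClassifier(GaussianNB(), n_estimators=100,
                                 algorithm="SAMME", random_state=0)

    for name, model in [("naive Bayes", plain), ("boosted naive Bayes", boosted)]:
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name:22s} accuracy = {acc:.3f}")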